COMS 6998-4 Fall 2017, Presenter: Yuemei Zhang

Authors

  • Yuemei Zhang
  • Che Shen
Abstract

In recent years, crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. However, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this note we introduce a computationally efficient algorithm with much lower labeling-cost overhead. In particular, we mainly consider the case where a noticeable fraction of labelers are perfect and the rest behave arbitrarily. We show that any hypothesis space F that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses.
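The abstract contrasts the note's algorithm with the naive approach of denoising each example by per-example voting, which drives up the number of queries. As a baseline for that contrast, here is a minimal sketch of per-example majority voting (all names here are hypothetical, not the note's actual algorithm), for a crowd in which a noticeable fraction of labelers are perfect and the rest adversarial:

```python
import random
from collections import Counter

def true_label(x):
    """Ground-truth labeling function (hypothetical example)."""
    return 1 if x >= 0 else 0

def aggregate_labels(example, labelers, k):
    """Query k randomly chosen labelers and return the majority vote."""
    votes = [labeler(example) for labeler in random.sample(labelers, k)]
    return Counter(votes).most_common(1)[0][0]

# A crowd in which 7 of 10 labelers are perfect and 3 answer
# adversarially (here: they always flip the true label).
perfect = [true_label] * 7
adversarial = [lambda x: 1 - true_label(x)] * 3
crowd = perfect + adversarial

# With k = 7, any sample of labelers contains at least 4 perfect ones,
# so the majority vote is always correct for this particular crowd.
print(aggregate_labels(0.3, crowd, 7))   # prints 1
```

The point of the note is precisely that this per-example overhead (k queries for every single example) can be avoided when a fraction of the crowd is perfect.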


Similar resources

COMS 6998-4 Fall 2017, Presenter: Geelon So

In the setting of active learning, the data comes unlabeled and querying the label of a data point is expensive. The goal of an active learner is to reduce the number of labels needed and output a hypothesis with error rate ≤ ε. Recall that the usual sample complexity of supervised learning is Ω(1/ε). The motivation for defining the splitting index is to characterize the sample complexity of active ...
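A standard illustration of the gap between the Ω(1/ε) passive sample complexity and what active learning can achieve is learning a threshold on an interval: binary search over label queries reaches error ε with only O(log(1/ε)) labels. A minimal sketch under that classic setting (function and variable names are hypothetical):

```python
def active_learn_threshold(oracle, lo=0.0, hi=1.0, eps=1e-3):
    """Binary-search for a threshold t* in [0, 1] by querying labels
    only at midpoints: O(log(1/eps)) label queries, versus the
    Omega(1/eps) labeled examples passive learning would need."""
    queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if oracle(mid) == 1:   # label 1 means mid lies at or above t*
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2, queries

t_star = 0.37
oracle = lambda x: 1 if x >= t_star else 0
t_hat, n_queries = active_learn_threshold(oracle)
print(n_queries)   # 10 queries instead of on the order of 1000 examples
```

The splitting index discussed in the lecture characterizes which hypothesis classes admit this kind of exponential savings and which do not.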


COMS 6998-4 Fall 2017, Presenter: Wenxi Chen

This lecture is delivered in a more philosophical than technical sense. In the past, we have learned a series of algorithms that can dig deep into a training dataset and generate a model from it. However, without knowing what knowledge the model contains, it is generally hard for human beings to trust the trained model and apply it in reality. Sometimes, biased...


COMS 6998-4 Fall 2017, Presenter: Daniel Hsu

+ err_{Pn}(h_n) − err_{Pn}(h*) + err_{Pn}(h*) − err_P(h*). The second part is less than or equal to 0, so we can disregard it when deriving an upper bound on the regret. Since the target function h* is independent of the sample pairs, the third part can be bounded easily by analyzing the binomial distribution with success probability err_P(h*) and n trials. To analyze the remaining first part, we b...
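The fragment above is the tail of the standard regret decomposition. Written out in full (assuming, as the surrounding text suggests, that h_n is the empirical risk minimizer on the sample distribution P_n and h* is the target):

```latex
\mathrm{err}_P(h_n) - \mathrm{err}_P(h^*)
  = \underbrace{\mathrm{err}_P(h_n) - \mathrm{err}_{P_n}(h_n)}_{\text{first part}}
  + \underbrace{\mathrm{err}_{P_n}(h_n) - \mathrm{err}_{P_n}(h^*)}_{\le 0 \text{ since } h_n \text{ is the ERM}}
  + \underbrace{\mathrm{err}_{P_n}(h^*) - \mathrm{err}_P(h^*)}_{\text{binomial tail bound}}
```

The three bracketed differences telescope, which is why only the first and third parts need to be bounded.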


COMS 6998-4 Fall 2017, Presenter: Yanlin

In previous lectures, we have learned a great deal about online learning. The basic idea is to maintain a subset of the hypothesis space as the version space and shrink that version space using new data or queries. We consider data in arbitrary form, meaning we make no specific assumption about the data's schema itself. Although this generalizes easily, we still want to make a practical effort ...
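The version-space idea described above can be made concrete with the classic halving algorithm: predict by majority vote over the surviving hypotheses, then discard every hypothesis the new label contradicts, which guarantees at most log2|H| mistakes. A minimal sketch on a toy finite class (all names hypothetical, not the lecture's own code):

```python
def halving_predict(version_space, x):
    """Predict by majority vote over the current version space."""
    votes = sum(h(x) for h in version_space)
    return 1 if votes * 2 >= len(version_space) else 0

def halving_update(version_space, x, y):
    """Discard every hypothesis that disagrees with the observed label y."""
    return [h for h in version_space if h(x) == y]

# Toy hypothesis class: integer thresholds t on [0, 8), h_t(x) = 1[x >= t].
H = [(lambda x, t=t: 1 if x >= t else 0) for t in range(8)]
target = H[5]

vs = list(H)
mistakes = 0
for x in [0, 7, 3, 5, 4, 6, 2]:
    y = target(x)
    if halving_predict(vs, x) != y:
        mistakes += 1          # each mistake at least halves the version space
    vs = halving_update(vs, x, y)
print(mistakes, len(vs))
```

Every mistaken majority vote means more than half of the version space was wrong and gets discarded, which is where the log2|H| = 3 mistake bound for this class comes from.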



Journal title:

Volume   Issue

Pages  -

Publication date: 2017